August 15, 2014

~$1.4 billion profit

-$17 million loss

-$90 million loss

MillionDollar$tory

Is it possible to leverage screenplay information to predict movie profitability and assist the decision-making process for screenplay selection?

Scope of the problem

  • A. There is a strong linear relationship between a movie's budget and its box office revenue.
  • B. Predicting a movie's profitability is therefore the more interesting problem.
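To make the distinction between revenue and profitability concrete, the sketch below computes the Pearson correlation between budget and revenue and derives profit from the two. The figures are made-up placeholders, not data from the project, and defining profit as revenue minus budget is an assumption, since the slides do not state the definition used.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def profit(budget, revenue):
    """Assumed definition: box office revenue minus production budget."""
    return revenue - budget

# Hypothetical films in millions of USD: illustrative numbers only.
budgets = [10, 40, 80, 150, 200]
revenues = [25, 90, 150, 400, 450]

r = pearson(budgets, revenues)
profits = [profit(b, v) for b, v in zip(budgets, revenues)]
print(f"budget/revenue correlation: {r:.2f}")
print("profits:", profits)
```

A correlation close to 1 means budget alone already explains most of the revenue, which is why the harder target is the residual quantity, profit.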

The workflow of MillionDollar$tory



Predicting the profitability of a movie

  • Features that were tried but failed to produce convincing results: readability index, sentiment analysis, tf-idf, and tf-idf with part-of-speech (POS) tagging.

  • word2vec: introduced in the paper "Efficient Estimation of Word Representations in Vector Space" (published by Google researchers).

  • Allows words with similar meanings to be clustered together; these clusters can then be used as features in a predictive model.
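One common way to group word vectors is k-means. The slides do not say which clustering algorithm was used, so treat this as an assumption. The sketch runs Lloyd's algorithm on tiny hand-made 2-D "word vectors"; real word2vec vectors typically have hundreds of dimensions and would come from a trained model.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, iterations=10):
    """Plain Lloyd's algorithm; the first k vectors seed the centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical 2-D embeddings standing in for trained word2vec vectors.
words = {
    "gun": [0.9, 0.1], "shoot": [0.85, 0.15], "chase": [0.8, 0.2],
    "kiss": [0.1, 0.9], "love": [0.15, 0.85], "wedding": [0.2, 0.8],
}
vocab = list(words)
labels, _ = kmeans([words[w] for w in vocab], k=2)

clusters = {}
for w, label in zip(vocab, labels):
    clusters.setdefault(label, []).append(w)
print(clusters)
```

Here the "action" words and the "romance" words fall into separate clusters; with real embeddings the number of clusters is a tuning choice.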

\[ \hat{y} = f\left(x_{budget},\, x_{1,\mathrm{word2vec}},\, \ldots,\, x_{n,\mathrm{word2vec}}\right) \]
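Read as a feature specification, the equation combines the budget with one feature per word2vec cluster. A minimal sketch of building that feature vector for one screenplay follows, assuming each cluster feature is the fraction of the script's tokens that fall into that cluster. The slides do not say how the cluster features were computed, and the `screenplay_features` helper and the word-to-cluster map are hypothetical.

```python
import re
from collections import Counter

def screenplay_features(text, budget, word_to_cluster, n_clusters):
    """Hypothetical helper: return [x_budget, x_1, ..., x_n], x_i = share of tokens in cluster i."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word_to_cluster[t] for t in tokens if t in word_to_cluster)
    total = len(tokens) or 1  # avoid division by zero on an empty script
    return [budget] + [counts[i] / total for i in range(n_clusters)]

# Hypothetical cluster assignment, e.g. produced by clustering word2vec vectors.
word_to_cluster = {"gun": 0, "shoot": 0, "chase": 0, "kiss": 1, "love": 1}

scene = "He grabs the gun. She says: I love you. He runs. The chase begins."
features = screenplay_features(scene, budget=50.0,
                               word_to_cluster=word_to_cluster, n_clusters=2)
print(features)
```

Normalising by script length keeps long and short screenplays comparable; raw counts would be an equally plausible choice.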

Predicting the profitability of a movie

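  • Used 10-fold cross-validation to benchmark a variety of regression models for profitability prediction: linear regression (LR), multivariate adaptive regression splines (MARS), support vector machines (SVM), random forests (RF), generalized boosted models (GBM), and classification and regression trees (CART).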
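The benchmarking procedure can be sketched in pure Python: shuffle the data into k folds, fit on k-1 folds, score on the held-out fold, and average the scores. The sketch compares only two stand-in models (a mean-only baseline and one-feature least-squares regression) on synthetic data; the project's actual models are not reimplemented here, RMSE as the metric is an assumption, and the helper names are illustrative.

```python
import random
from math import sqrt

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x

def cross_val_rmse(fit, xs, ys, k=10):
    """Average held-out RMSE over k folds."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        se = [(model(xs[i]) - ys[i]) ** 2 for i in fold]
        scores.append(sqrt(sum(se) / len(se)))
    return sum(scores) / k

# Synthetic data standing in for (feature, profit) pairs.
rng = random.Random(42)
xs = [rng.uniform(1, 200) for _ in range(100)]
ys = [0.5 * x + rng.gauss(0, 10) for x in xs]

results = {name: cross_val_rmse(fit, xs, ys)
           for name, fit in [("mean", fit_mean), ("linear", fit_linear)]}
print(results)
```

Using the same fold assignment (fixed seed) for every model keeps the comparison fair, since each model is scored on identical held-out data.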

About me: Thomas Vincent

Project Toolkit

word2vec

  • Relies on a shallow neural network to learn vector representations of words and concepts from text documents.
  • Words with similar meanings end up near each other in this high-dimensional vector space.
  • Allows one to solve simple analogies by performing arithmetic on the word vectors and examining the nearest words in the vector space.
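The analogy trick (for example, king - man + woman lands near queen) can be illustrated with vector arithmetic and cosine similarity. The 3-D vectors below are hand-made for the example, with dimensions loosely meaning (royalty, maleness, femaleness); real embeddings come from training word2vec on a corpus.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def analogy(a, b, c, vectors):
    """'a is to b as c is to ?': word nearest to vec(b) - vec(a) + vec(c), inputs excluded."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Hand-made toy vectors, not trained embeddings.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.3, 0.3],
}
print(analogy("man", "king", "woman", vectors))  # prints queen
```

Excluding the three input words matters: with real embeddings the nearest vector to the target is often one of the inputs themselves.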

Movie analytics

  • The data also lends itself to sentiment analysis of the movie scripts.
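A very simple form of script sentiment analysis is lexicon-based scoring per scene, which gives a sentiment curve across the story. The slides do not say which sentiment method was used; the tiny word lists below are hypothetical stand-ins for a published sentiment lexicon, and splitting on INT./EXT. scene headings assumes standard screenplay formatting.

```python
import re

# Hypothetical mini-lexicon; a real analysis would use a published sentiment lexicon.
POSITIVE = {"love", "happy", "laugh", "win", "safe"}
NEGATIVE = {"die", "kill", "cry", "lose", "afraid"}

def split_scenes(script):
    """Split a screenplay before each scene heading (a line starting with INT. or EXT.)."""
    parts = re.split(r"^(?=(?:INT|EXT)\.)", script, flags=re.MULTILINE)
    return [p.strip() for p in parts if p.strip()]

def sentiment_score(text):
    """(positive - negative) word count, normalised by the number of tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

script = (
    "INT. KITCHEN - DAY\n"
    "They laugh. They are happy.\n"
    "EXT. STREET - NIGHT\n"
    "He is afraid they will kill him.\n"
)
scores = [sentiment_score(scene) for scene in split_scenes(script)]
print(scores)
```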

rmarkdown::render("MillionDollarStory_Presentation.Rmd")